pacman::p_load(tmap, sf, DT, stplanr, tidyverse)Hands-on Ex 9A
Processing & Visualising Flow Data
The hands-on exercise on Modelling Geographical Accessibility is in Hands-on Ex 10
1. Overview
Spatial interaction (SI) refers to the movement of people, goods, or information between locations, covering activities like freight shipments, energy flows, global trade, commutes, and pedestrian traffic. Defined broadly, SI is the flow of individuals, commodities, capital, and information across space, often influenced by decision-making (Fotheringham & O’Kelly, 1989). The key principle of SI is that individuals or entities weigh the benefits of interaction (such as commuting to work or migrating) against the costs of overcoming spatial distance (Fischer, 2001), a core concept for understanding spatial behavior.
Each interaction forms an origin/destination (OD) pair, which can be organized in an OD matrix, mapping origins to destinations. In this exercise, we will build an OD matrix using Passenger Volume by Origin Destination Bus Stops data from LTA DataMall to explore SI in practical applications.
The following R packages are used, with stplanr being the key package used for transport planning and modeling. It provides tools to download and clean transport data, map “desire lines” (the direct routes people prefer to take between locations), assign routes for travel (including options for cycling routes), calculate details about each route, like direction and traffic flow, and analyze areas reachable within specific travel times.
2. Importing & Preparing Data
We will import and work with three datasets:
- Bus Commuters by Origin/Destination data from LTA DataMall
- Bus Stop Locations: Data on bus stop locations as of the last quarter of 2022.
- MPSZ-2019: Sub-zone boundary data from the URA Master Plan 2019.
odbus <- read_csv("data/aspatial/origin_destination_bus_202210.csv")Rows: 5122925 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): YEAR_MONTH, DAY_TYPE, PT_TYPE
dbl (4): TIME_PER_HOUR, ORIGIN_PT_CODE, DESTINATION_PT_CODE, TOTAL_TRIPS
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(odbus)Rows: 5,122,925
Columns: 7
$ YEAR_MONTH <chr> "2022-10", "2022-10", "2022-10", "2022-10", "2022-…
$ DAY_TYPE <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 10, 10, 7, 11, 16, 16, 20, 7, 7, 11, 11, 8, 11, 11…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <dbl> 65239, 65239, 23519, 52509, 54349, 54349, 43371, 8…
$ DESTINATION_PT_CODE <dbl> 65159, 65159, 23311, 42041, 53241, 53241, 14139, 9…
$ TOTAL_TRIPS <dbl> 2, 1, 2, 1, 1, 4, 1, 3, 1, 5, 2, 5, 15, 40, 1, 1, …
Since ORIGIN_PT_CODE and DESTINATION_PT_CODE are in numeric data type, as.factor() function is used to convert it to character/factor data type.
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) For this exercise, we will focus on weekday commuting flows between 6:00 and 9:00 am. The eventual dataset is the total number of trips between origin and destination bus stops.
odbus6_9 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 &
TIME_PER_HOUR <= 9) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.
datatable(odbus6_9)Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html
We use st_transform() to transform the projection to CRS 3414.
busstop <- st_read(dsn = "data/geospatial",
layer = "BusStop") %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`/Users/stephentay/stephentay/ISSS626-Geospatial-Analytics/Hands-on_Ex/Hands-on_Ex09/data/geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
We use st_transform() to transform the projection to CRS 3414.
mpsz <- st_read(dsn = "data/geospatial",
layer = "MPSZ-2019") %>%
st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`/Users/stephentay/stephentay/ISSS626-Geospatial-Analytics/Hands-on_Ex/Hands-on_Ex09/data/geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
mpszSimple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
SUBZONE_N SUBZONE_C PLN_AREA_N PLN_AREA_C REGION_N
1 MARINA EAST MESZ01 MARINA EAST ME CENTRAL REGION
2 INSTITUTION HILL RVSZ05 RIVER VALLEY RV CENTRAL REGION
3 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
4 JURONG ISLAND AND BUKOM WISZ01 WESTERN ISLANDS WI WEST REGION
5 FORT CANNING MUSZ02 MUSEUM MU CENTRAL REGION
6 MARINA EAST (MP) MPSZ05 MARINE PARADE MP CENTRAL REGION
7 SUDONG WISZ03 WESTERN ISLANDS WI WEST REGION
8 SEMAKAU WISZ02 WESTERN ISLANDS WI WEST REGION
9 SOUTHERN GROUP SISZ02 SOUTHERN ISLANDS SI CENTRAL REGION
10 SENTOSA SISZ01 SOUTHERN ISLANDS SI CENTRAL REGION
REGION_C geometry
1 CR MULTIPOLYGON (((33222.98 29...
2 CR MULTIPOLYGON (((28481.45 30...
3 CR MULTIPOLYGON (((28087.34 30...
4 WR MULTIPOLYGON (((14557.7 304...
5 CR MULTIPOLYGON (((29542.53 31...
6 CR MULTIPOLYGON (((35279.55 30...
7 WR MULTIPOLYGON (((15772.59 21...
8 WR MULTIPOLYGON (((19843.41 21...
9 CR MULTIPOLYGON (((30870.53 22...
10 CR MULTIPOLYGON (((26879.04 26...
As the flow analysis is at the subzonal level, we will need to retrieve the planning subzone code (SUBZONE_C) for each bus stop. The st_intersection() function performs a point-in-polygon overlay, resulting in a point sf dataframe that matches each bus stop to its respective subzone. Next, select() is used to retain only BUS_STOP_N and SUBZONE_C in the busstop_mpsz data frame.
Note that five bus stops are excluded from the result as they fall outside Singapore’s boundary.
busstop_mpsz <- st_intersection(busstop, mpsz) %>%
select(BUS_STOP_N, SUBZONE_C) %>%
st_drop_geometry()Warning: attribute variables are assumed to be spatially constant throughout
all geometries
The code below found that there were duplicated bus stop records.
duplicate <- busstop_mpsz %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate# A tibble: 24 × 2
BUS_STOP_N SUBZONE_C
<chr> <chr>
1 82221 GLSZ05
2 82221 GLSZ05
3 22501 JWSZ09
4 22501 JWSZ09
5 96319 TMSZ05
6 96319 TMSZ05
7 43709 BKSZ07
8 43709 BKSZ07
9 68091 SLSZ04
10 68091 SLSZ04
# ℹ 14 more rows
We use unique() to de-duplicate the records.
busstop_mpsz <- unique(busstop_mpsz)datatable(busstop_mpsz)Next, we append the planning subzone code from busstop_mpsz dataframe to the origin bus stops from odbus6_9 dataframe.
od_data <- left_join(odbus6_9 , busstop_mpsz,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_SZ = SUBZONE_C,
DESTIN_BS = DESTINATION_PT_CODE)Warning in left_join(odbus6_9, busstop_mpsz, by = c(ORIGIN_PT_CODE = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 103964 of `x` matches multiple rows in `y`.
ℹ Row 161 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
We can use the code below to check for duplicated records.
duplicate <- od_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate# A tibble: 0 × 4
# ℹ 4 variables: ORIGIN_BS <chr>, DESTIN_BS <fct>, TRIPS <dbl>, ORIGIN_SZ <chr>
If there are duplicated records, we use unique() to de-duplicate them.
od_data <- unique(od_data)Now, we append the planning subzone code from busstop_mpsz dataframe to the destination bus stops from odbus6_9 dataframe.
od_data <- left_join(od_data , busstop_mpsz,
by = c("DESTIN_BS" = "BUS_STOP_N")) Warning in left_join(od_data, busstop_mpsz, by = c(DESTIN_BS = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 11263 of `x` matches multiple rows in `y`.
ℹ Row 1379 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
We can use the code below to check for duplicated records.
duplicate <- od_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
duplicate# A tibble: 0 × 5
# ℹ 5 variables: ORIGIN_BS <chr>, DESTIN_BS <chr>, TRIPS <dbl>,
# ORIGIN_SZ <chr>, SUBZONE_C <chr>
If there are duplicated records, we use unique() to de-duplicate them.
od_data <- unique(od_data)
glimpse(od_data)Rows: 223,468
Columns: 5
Groups: ORIGIN_BS [4,995]
$ ORIGIN_BS <chr> "1012", "1012", "1012", "1012", "1012", "1012", "1012", "101…
$ DESTIN_BS <chr> "1112", "1113", "1121", "1211", "1311", "7371", "60011", "60…
$ TRIPS <dbl> 168, 88, 77, 65, 172, 8, 26, 1, 26, 21, 2, 8, 1, 10, 11, 3, …
$ ORIGIN_SZ <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ SUBZONE_C <chr> NA, NA, NA, NA, NA, NA, "KLSZ08", "KLSZ08", "KLSZ02", "KLSZ0…
In the final step, we remove rows with null values in either the origin or destination subzone, then aggregate the total number of trips for each origin-destination subzone pair.
od_data <- od_data %>%
rename(DESTIN_SZ = SUBZONE_C) %>%
drop_na() %>%
group_by(ORIGIN_SZ, DESTIN_SZ) %>%
summarise(MORNING_PEAK = sum(TRIPS))`summarise()` has grouped output by 'ORIGIN_SZ'. You can override using the
`.groups` argument.
glimpse(od_data)Rows: 14,734
Columns: 3
Groups: ORIGIN_SZ [279]
$ ORIGIN_SZ <chr> "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01…
$ DESTIN_SZ <chr> "AMSZ01", "AMSZ02", "AMSZ03", "AMSZ04", "AMSZ05", "AMSZ06…
$ MORNING_PEAK <dbl> 1998, 8289, 8971, 2252, 6136, 2148, 1620, 1925, 1773, 63,…
3. Visualising Spatial Interaction
In this section, we will visualise “desire lines” (the routes people prefer to take between locations) using the stplanr package.
3.1 Remove intra-zonal flows
This study excludes intra-zonal flows, so they are removed from the analysis.
od_data_fij <- od_data[od_data$ORIGIN_SZ != od_data$DESTIN_SZ,]3.2 Create inter-zonal desire lines
We use the od2line() function from the stplanr package to create inter-zonal desire lines.
flowLine <- od2line(flow = od_data_fij,
zones = mpsz,
zone_code = "SUBZONE_C")Creating centroids representing desire line start and end points.
3.3 Visualise desire lines
The map below shows inter-zonal bus commuter flows on weekdays between 6:00 and 9:00 am.
tm_shape(mpsz) +
tm_polygons() +
flowLine %>%
tm_shape() +
tm_lines(lwd = "MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.5)Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

When inter-zonal bus commuter flows are complex or highly skewed, it can be effective to focus on selected flows, such as those with values greater than or equal to 5,000, as shown below.
tm_shape(mpsz) +
tm_polygons() +
flowLine %>%
filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
tm_lines(lwd = "MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.5)Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length
